Los Angeles, CA has a population of 3.98M people with a median age of 35.6 and a median household income of $54,432. Between 2015 and 2016 the population of Los Angeles, CA grew from 3.97M to 3.98M, a 0.11% increase and its median household income grew from $52,024 to $54,432, a 4.63% increase.
Los Angeles County, CA has a population of 10.1M people with a median age of 36.3 and a median household income of $61,338. Between 2015 and 2016 the population of Los Angeles County, CA declined from 10.2M to 10.1M, a 0.32% decrease and its median household income grew from $59,134 to $61,338, a 3.73% increase.
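The income growth rates quoted above can be reproduced with a one-line percent-change formula. The sketch below uses only the dollar figures given in the text; the function name `pct_change` and the data frame layout are illustrative assumptions. The population percentages cannot be checked the same way, because the text gives only rounded counts (3.97M, 3.98M).

```r
# Percent change from an old value to a new value.
pct_change <- function(old, new) 100 * (new - old) / old

# Median household income figures taken from the text above.
income <- data.frame(
  place = c("Los Angeles, CA", "Los Angeles County, CA"),
  y2015 = c(52024, 59134),
  y2016 = c(54432, 61338)
)

# Round to two decimals, as in the text.
income$pct_change <- round(pct_change(income$y2015, income$y2016), 2)
print(income)
```

Rounded to two decimals, this gives 4.63 for the city and 3.73 for the county, matching the percentages in the text.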
The tech industry provides high-paying jobs and is a major contributor to the economy. For example, in 2017 the median wage for all occupations in the U.S. was $37,700, while it was $84,600 for computer occupations, second only to the $102,600 for management occupations. (Mr. William Yu)
2017 Employment by major occupational group, 2016 and projected 2026 (numbers in thousands) [1]. Tech employees are well compensated thanks to this tech boom. For example, in 2017 the annual median wage for all occupations in the U.S. was $37,700, while it was $84,600 for computer occupations, second only to the $102,600 for management occupations, as shown in Table 1.
What can we learn from our data scientist uncle? Fig. 2 shows the user distribution by age. We use a two-color scheme [18] to highlight which age group won the most competitions per user*. However, too many age bins can overwhelm any reader. One way to declutter the bins and structure them into usable knowledge is to reduce their number and group them in a familiar, relatable form, for example by generation. In this case, we used the generations in the workforce (Gen X, Y, Z and the Boomers [5]), and we are interested in which group is the most productive in terms of competitions and cash prizes per user. Because everyone belongs to a generation, this chart can become very personal. What can we learn from the wisdom that each generation offers?
Generation year brackets and work-ethic attributes
*Source: Own estimate, xSurvey. Source - Q2 : What is your age (# years)?
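Collapsing many age bins into a handful of generations can be sketched with base R's `cut()`. The birth-year cutoffs, the survey year, and the sample ages below are assumptions made for illustration only; they are not the brackets used for Fig. 2.

```r
# Hypothetical survey year and respondent ages (illustrative only).
survey_year <- 2018
ages <- c(19, 24, 31, 38, 45, 57, 66)

# Convert age to an approximate birth year.
birth_year <- survey_year - ages

# Assumed generation cutoffs (right-closed intervals):
# (1945, 1964] Boomers, (1964, 1980] Gen X, (1980, 1996] Gen Y, (1996, Inf) Gen Z.
generation <- cut(
  birth_year,
  breaks = c(1945, 1964, 1980, 1996, Inf),
  labels = c("Boomers", "Gen X", "Gen Y", "Gen Z"),
  right = TRUE
)

# Count respondents per generation instead of per age bin.
print(table(generation))
```

With four labels instead of dozens of age bins, the same counts can feed a much simpler bar chart.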
Fig. 3 tells a #digitaldivide story. How inclusive are we as a community? Should we pat ourselves on the back? Again, to create knowledge we need to relate the data to readers in ways that let them connect it to other knowledge they have. Here, one way is to use income percentiles (see #onepercent). In the US, to belong to the 1% elite, one needs to earn more than $422k per year [10]. About 23 respondents declared that they do. In addition, about 6% declared an income that puts them in the top 10%, an encouraging number because 6% is close to 10%. The top-10% income threshold is about $166k in the US [11], so if the sample reflects the distribution found in society, it is at least somewhat inclusive. We add a smiley emoji to reassure the reader that yes, this is good.
However, those numbers are for US household incomes. Globally, the top-1% threshold is $32k per year. This puts 60% of the respondents in the global top 1%. 60% is very different from 1%, so globally this data point does not support inclusiveness because it does not reflect the global distribution. #Ahamoment. One way to create such moments in the story is to A/Bify it by switching between two points of view. Source - Q9 : What is your current yearly compensation (approximate $USD)?
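The A/B switch between the US and the global point of view amounts to measuring the same compensation data against different thresholds. The sketch below uses the three thresholds quoted in the text ($422k, $166k, $32k); the `compensation` vector is made-up sample data, not the survey responses.

```r
# Hypothetical yearly compensation values in USD (illustrative only).
compensation <- c(8000, 15000, 28000, 40000, 55000,
                  90000, 120000, 170000, 250000, 450000)

# Thresholds quoted in the text.
thresholds <- c(us_top_1 = 422000, us_top_10 = 166000, global_top_1 = 32000)

# Share of respondents above each threshold.
share_above <- sapply(thresholds, function(t) mean(compensation > t))
print(round(100 * share_above, 1))
```

Comparing each share with the percentile it stands for (1% or 10%) shows how far the sample departs from the reference population under each point of view.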
Fig. 4. This chart is an example of less is more. Often, displaying detailed usage percentages adds little to the story. In this case Sci-Kit has a 48% share, TF has 16%, and Keras follows with 14%. However, anthropomorphizing the ranking with a podium conveys a memorable affordance: the glory the winner deserves for the great utility this library has provided to the community. The podium template is from a #sketchthinking book [12]. Source - Q20 : Of the choices that you selected in the previous question, which ML library have you used the most?
Fig. 5. This is a combination of a famous chart template called Marimekko with a symbolic chart called House of Shiva [13-14]. Symbolism: the columns support the visualization efforts of the community. The width of each “column” expresses how much of the workload it supports. Grey columns on the right represent other, less mainstream libraries such as D3, Shiny, bokeh, Leaflet and Lattice.
Source - Q22 : …which specific data visualization library or tool have you used the most?
Fig. 6. Where do new data scientist users come from? 1,145 new data scientists were added in 2018 from more than 100 countries. A typical mistake here is to group the counts by country: there are too many countries for a human to make sense of! It makes much more sense to group them into economic mega-regions: US, Europe, BRICS and the rest of the world. (The term “BRIC” was coined in 2001 by then-chairman of Goldman Sachs Asset Management, Jim O’Neill [16].) When we do that, we see that BRICS is not only the top contributor to growth in data scientists, with 42% of total growth for 2018, but also the fastest growing among the big three. In 2018, Europe added 302 users who define themselves as data scientists, the US 131, the rest of the world 231, and BRICS 481. At these growth rates, by 2020 BRICS would outnumber Europe and the US combined.
Data source: we forked and modified a snippet of the excellent code from [15] and took the top 20 countries whose respondents identified as “data scientist” [17]. EU-6 means the top 6 European countries.
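The regional shares behind Fig. 6 can be checked directly from the four counts given in the text. This is a minimal sketch; the vector name `new_users` is an illustrative choice.

```r
# New self-identified data scientists added in 2018, by mega-region (from the text).
new_users <- c(Europe = 302, US = 131, "Rest of world" = 231, BRICS = 481)

# Total across regions; matches the 1,145 quoted in the text.
total <- sum(new_users)

# Each region's share of total growth, in percent.
share <- round(100 * new_users / total, 1)
print(sort(share, decreasing = TRUE))
```

The BRICS share comes out at 42%, consistent with the figure in the text.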
Fig. 7 shows users per capita using 2016 World Bank data [21]. We highlight the top country in red and the US (home to the largest community) in black so the reader has a reference point. This chart has a lot going on:
EU gap
The US-EU gap is about 50%. However, the UK mean is closer to the EU6 mean than to the US mean. Can we therefore discard the language barrier as an explanatory factor for the gap?
Assumptions
The x-label “Data Scientists per 10,000” is technically “respondents per 10,000 residents”, which we take as an estimate of data scientist prevalence. In addition, we assume (i) that the ratio of respondents to data scientists is constant across borders, and (ii) that all respondents are data scientists. The BRICS and EU6 means are means of country means, not weighted by respondents.
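The difference between a mean of country means and a respondent-weighted mean can be illustrated as follows. The three countries and their numbers are hypothetical; only the per-10,000 definition and the weighting choice come from the text.

```r
# Hypothetical countries (illustrative numbers, not survey data).
countries <- data.frame(
  country     = c("A", "B", "C"),
  respondents = c(50, 200, 30),
  population  = c(10e6, 80e6, 5e6)
)

# Respondents per 10,000 residents for each country.
countries$per_10k <- countries$respondents / countries$population * 1e4

# Mean of country means: every country counts equally (the approach described above).
unweighted_mean <- mean(countries$per_10k)

# Alternative: weight each country's rate by its number of respondents.
weighted_by_respondents <- weighted.mean(countries$per_10k, w = countries$respondents)

print(c(unweighted = unweighted_mean, weighted = weighted_by_respondents))
```

The unweighted version keeps a single large country from dominating its region's average, at the cost of giving small samples the same influence as large ones.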
Aesthetic considerations
This color scheme is called red on grey; it is my favorite scheme for charts. Unlike other schemes such as purple on grey, it is gender neutral [23]. However, for it to work the red surface must be kept to a minimum, otherwise it comes across as strident. The blue on grey scheme does not have this limitation (see Figs. 1-5). However, red on grey has one secret advantage. Usually, using three colors in a chart will clutter it, but because the chromatic distance between red and any shade of grey is so large, we can get away with using black (as an 85% grey) as a third color with only a small clutter trade-off.
Source - World Bank Population Data 2016, Q11 - Current country of residence [20]
## Warning: Removed 33 rows containing non-finite values (stat_smooth).
## Warning: Removed 33 rows containing missing values (geom_point).
Fig. 8 shows the association between the 2012 CHCI and the 2016 median household income of 282 zip codes in L.A. County. We can see a very strong correlation between the human capital level [23] and the household income of each zip code. It suggests the importance of investing in education (CHCI) in order to create a more productive workforce and more high-paying jobs.
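The strength of an association like the one in Fig. 8 can be quantified with a correlation coefficient and a simple linear fit, after dropping rows with missing values (the plotting warnings above report the same kind of row removal). The sketch below uses made-up values; the column names `chci` and `median_income` are assumptions, not the columns of the actual data set.

```r
# Hypothetical zip-code data with some missing values (illustrative only).
zips <- data.frame(
  chci          = c(100, 110, 120, NA, 140, 150, 160),
  median_income = c(40000, 47000, 52000, 60000, NA, 74000, 81000)
)

# Keep only rows where both variables are present.
complete <- zips[complete.cases(zips), ]

# Pearson correlation on the complete rows.
r <- cor(complete$chci, complete$median_income)

# Simple linear regression of income on CHCI.
fit <- lm(median_income ~ chci, data = complete)

print(r)
print(coef(fit))
```

A correlation close to 1 indicates a strong linear association, but on its own it does not establish that one variable causes the other.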
**Can we forecast how a country’s innovation will evolve?**
In the chart, some countries (such as Australia) are below the regression line (not shown here). We call these countries data science early adopters - countries
Source - 5-year American Community Survey 2012-2016, Global Innovation Index 2018, World Bank Population Data 2016, Q11 - Current country of residence [19-21]
lacounty_data <- read.csv("./data/lacounty_data.csv")
library(googleVis)
## Creating a generic function for 'toJSON' from package 'jsonlite' in package 'googleVis'
##
## Welcome to googleVis version 0.6.2
##
## Please read Google's Terms of Use
## before you start using the package:
## https://developers.google.com/terms/
##
## Note, the plot method of googleVis will by default use
## the standard browser to display its output.
##
## See the googleVis package vignettes for more details,
## or visit http://github.com/mages/googleVis.
##
## To suppress this message use:
## suppressPackageStartupMessages(library(googleVis))
# Motion chart of tech jobs vs. all jobs by location over time
J1 <- gvisMotionChart(lacounty_data, idvar = "location", timevar = "year",
                      xvar = "all_jobs", yvar = "tech_jobs",
                      options = list(width = 1500, height = 630))
plot(J1)
## starting httpd help server ...
## done
#googleWalkout [2] and is bad business - says Forbes [3]. In Fig. 1 (above), we use superhero-themes #batman #wonderwoman to visualize the heavy topic of #gender_equality in #datascience. See a bar chart for a more accurate breakdown [4]. Source: survey question Q1 - What is your gender? Sample size = 23,859 respondents